Inferring language change from computer corpora: Some methodological problems
Abstract
As the number and size of computer corpora grow, linguistic researchers are increasingly using them to study changes in language over time. Comparing usage at one point in time with usage at a later or an earlier period seems a stunningly simple and Saussureanly impeccable method of studying language change. Needless to say, the reality is rather different. This paper identifies some of the methodological problems encountered in using computer corpora to describe changes in sexist usages in New Zealand English (NZE) over a twenty-five-year period.
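The "stunningly simple" comparison described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the paper's procedure: it assumes each period is available as a plain list of tokens, normalizes raw counts to occurrences per million words, and uses the two-cell log-likelihood (G²) statistic common in corpus linguistics as a significance check. The function names `per_million`, `log_likelihood` and `compare` are hypothetical.

```python
import math
from collections import Counter


def per_million(count: int, corpus_size: int) -> float:
    """Normalize a raw count to occurrences per million words."""
    return count / corpus_size * 1_000_000


def log_likelihood(a: int, b: int, n1: int, n2: int) -> float:
    """Two-cell log-likelihood (G2) for a term seen `a` times in a corpus
    of `n1` tokens and `b` times in a corpus of `n2` tokens."""
    total = a + b
    e1 = n1 * total / (n1 + n2)  # expected count in the earlier corpus
    e2 = n2 * total / (n1 + n2)  # expected count in the later corpus
    g2 = 0.0
    for observed, expected in ((a, e1), (b, e2)):
        if observed > 0:  # a zero cell contributes nothing (0 * ln 0 -> 0)
            g2 += observed * math.log(observed / expected)
    return 2 * g2


def compare(term: str, earlier_tokens: list[str], later_tokens: list[str]) -> dict:
    """Naively compare one term's frequency across two time-sliced corpora
    (hypothetical helper; assumes both corpora are already tokenized)."""
    c1 = Counter(earlier_tokens)[term]
    c2 = Counter(later_tokens)[term]
    n1, n2 = len(earlier_tokens), len(later_tokens)
    return {
        "earlier_pmw": per_million(c1, n1),
        "later_pmw": per_million(c2, n2),
        "g2": log_likelihood(c1, c2, n1, n2),
    }
```

A sketch like this embodies exactly the naive assumption being questioned: it treats the two corpora as directly comparable samples, so any difference in sampling, text types or tokenization between the periods would surface as apparent "change".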